
wip: add support for node extraction -> cluster metadata #26

Merged: 2 commits, Feb 25, 2024


@vsoch (Member) commented Feb 24, 2024

Problem: we need to extract metadata for nodes and then parse it into a cluster graph.
Solution: add a compspec create nodes subcommand.

This is for the step of registering a cluster with rainbow. Information about the design and planning is here.

In this PR I am adding a ClusterGraph, which still needs work so that its output maps easily into JGF (right now it has elements that can hold any type and need further parsing). I am also generalizing the idea of plugins, so we will have extractors and converters (that run create), but I still need to finalize the design for the latter; right now the create commands are very separate. I am opening the PR sooner rather than later in case my computer explodes. A few problems I have run into: NFD does not report CPU counts, let alone physical vs. logical cores. This information is in /proc/cpuinfo on x86 but not on ARM. We also have no way to get the socket -> core mapping. So we likely need to add the hwloc extractor and provide an automated build for it, since it requires hwloc on the system. I will put some thought into this.
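As a rough illustration of the /proc/cpuinfo point, here is a minimal Go sketch that counts logical processors, physical cores, and sockets from the file's text. It assumes the x86 layout, where each processor block carries "physical id" and "core id" fields; logical CPUs sharing the same (physical id, core id) pair are hardware threads of one physical core. The type and function names are hypothetical, not part of compspec, and on ARM kernels that omit those fields the sketch reports only the logical count, which is the gap described above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// CPUCounts is a hypothetical summary of CPU topology from /proc/cpuinfo.
type CPUCounts struct {
	Logical  int // number of "processor" entries
	Physical int // distinct (physical id, core id) pairs; 0 if the fields are absent
	Sockets  int // distinct physical ids; 0 if the fields are absent
}

// countCPUs parses the text of /proc/cpuinfo (assumed x86 layout).
func countCPUs(cpuinfo string) CPUCounts {
	var counts CPUCounts
	cores := map[string]bool{}
	sockets := map[string]bool{}
	physID, coreID := "", ""

	// flush records the (socket, core) pair for the block just finished.
	flush := func() {
		if physID != "" && coreID != "" {
			cores[physID+"/"+coreID] = true
			sockets[physID] = true
		}
		physID, coreID = "", ""
	}

	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "" {
			flush() // a blank line ends one processor block
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		switch key {
		case "processor":
			counts.Logical++
		case "physical id":
			physID = value
		case "core id":
			coreID = value
		}
	}
	flush() // the last block may not end with a blank line
	counts.Physical = len(cores)
	counts.Sockets = len(sockets)
	return counts
}

func main() {
	// Illustrative data: two sockets, one core each, two threads per core.
	sample := `processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0

processor   : 2
physical id : 1
core id     : 0

processor   : 3
physical id : 1
core id     : 0`
	fmt.Printf("%+v\n", countCPUs(sample)) // {Logical:4 Physical:2 Sockets:2}
}
```

Note that this only counts cores; it still does not give the full socket -> core graph that hwloc would.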

Dinosaur TODO

Update on the above: hwloc won't be possible in the near term because there is a bug in the library. I've opened an issue, and I'll move the bullets to #19.


Update: JGF looks good! It was... a pointer bug!



Signed-off-by: vsoch <[email protected]>
@vsoch force-pushed the refactor-plugin-design-add-nodes branch from ca04705 to 11118b1 on February 25, 2024 00:05
Problem: we did not have a way to define a creator as a plugin.
Solution: add a plugin interface to create. I originally was going
to create separate plugin types, but I like the idea that one plugin
family can easily define both. When this is refactored toward a
more "register" design (so the set of available plugins can change),
it will be nice to provide Create/Extract from the same interface
and keep the number of interfaces and functions minimal. I was
going to add hwloc now, but there seems to be a bug, so we will
need to move forward prototyping with the current proxy for nodes,
which just uses the Go runtime package. We obviously need to
improve upon this.

Signed-off-by: vsoch <[email protected]>
@vsoch force-pushed the refactor-plugin-design-add-nodes branch from c5d3e13 to 0061aac on February 25, 2024 03:41
@vsoch (Member, Author) commented Feb 25, 2024

This is ready to go.

  • We have a new "creator" interface for plugins, for which we currently support compatibility artifacts and cluster JGF (v2) graphs. See the note here for details about that design decision.
  • The extraction of single nodes (compspec extract) followed by generation of the JGF graph (compspec create nodes) also looks good.

The current issues with the second are that node feature discovery is not giving us the topology of sockets -> cores, nor is it able to give us basic counts of virtual vs physical cores. We will need hwloc for that (I hope/think). I can't add the bindings yet because there is a bug, but I wrote the Makefile (added here) that will make that possible.

This should be enough to prototype with rainbow, so I'm going to merge. Will pick up more after dinner.

@vsoch merged commit 2380578 into main on Feb 25, 2024
@vsoch deleted the refactor-plugin-design-add-nodes branch on February 25, 2024 03:45